As a genetics-based machine learning technique, the zeroth-level classifier system (ZCS) is based on a discounted-reward reinforcement learning algorithm, the bucket-brigade algorithm, which optimizes the discounted total reward received by an agent but is not suitable for all multi-step problems, especially large ones. Undiscounted reinforcement learning methods, such as R-learning, instead optimize the average reward per time step. In this paper, R-learning replaces the discounted-reward reinforcement learning approach of ZCS, and tournament selection replaces roulette wheel selection. The modification results in classifier systems that can support long action chains and are thus able to solve large multi-step problems.
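
The two ingredients named above can be illustrated with a minimal sketch. It is not the paper's ZCS implementation: it assumes a plain tabular setting in which action values are stored per state, whereas ZCS distributes credit over classifier strengths. The function names, the step sizes `alpha` and `beta`, and the tournament size are hypothetical choices made for illustration. The update follows the commonly cited textbook form of R-learning, in which rewards are measured relative to an estimate `rho` of the average reward per time step and no discount factor appears.

```python
import random


def r_learning_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning step and return the updated rho.

    q maps each state to a list of action values and is modified in place.
    alpha and beta are illustrative step sizes, not values from the paper.
    """
    best_next = max(q[s_next])
    # Undiscounted update: the reward is taken relative to the average reward.
    q[s][a] += alpha * (r - rho + best_next - q[s][a])
    # The average-reward estimate is adjusted only after a greedy action.
    best_here = max(q[s])
    if q[s][a] == best_here:
        rho += beta * (r - rho + best_next - best_here)
    return rho


def tournament_select(strengths, size=2, rng=random):
    """Return the index of the strongest of `size` randomly drawn candidates.

    Candidates are drawn with replacement; this is an assumed variant.
    """
    candidates = [rng.randrange(len(strengths)) for _ in range(size)]
    return max(candidates, key=lambda i: strengths[i])
```

Unlike roulette wheel selection, the tournament depends only on the ordering of the strengths within the drawn sample, not on their absolute magnitudes, so selection pressure does not change when all strengths are shifted or scaled together.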